Chapter 3 Reproducibility

3.1 Reproducing data from a published paper

Here I show how I reproduced results from a published paper.
The data used in this assignment come from (van der Voet et al. 2021).

library(tidyverse)
library(here)
library(readxl)
library(rbbt)
library(RColorBrewer)
offspring <- read_excel(here("data/CE.LIQ.FLOW.062_Tidydata.xlsx"), sheet = 1)

# We want to see whether the data for the experimental conditions have been imported correctly
offspring %>% select(c("expType", "RawData", "compName", "compConcentration"))
## # A tibble: 360 x 4
##    expType    RawData compName                   compConcentration
##    <chr>        <dbl> <chr>                      <chr>            
##  1 experiment      44 2,6-diisopropylnaphthalene 4.99             
##  2 experiment      37 2,6-diisopropylnaphthalene 4.99             
##  3 experiment      45 2,6-diisopropylnaphthalene 4.99             
##  4 experiment      47 2,6-diisopropylnaphthalene 4.99             
##  5 experiment      41 2,6-diisopropylnaphthalene 4.99             
##  6 experiment      35 2,6-diisopropylnaphthalene 4.99             
##  7 experiment      41 2,6-diisopropylnaphthalene 4.99             
##  8 experiment      36 2,6-diisopropylnaphthalene 4.99             
##  9 experiment      40 2,6-diisopropylnaphthalene 4.99             
## 10 experiment      38 2,6-diisopropylnaphthalene 4.99             
## # ... with 350 more rows
# As we can see, RawData should be an integer, compName and expType should be factors, and compConcentration should be a double. Let's change that.

offspring$RawData <- as.integer(offspring$RawData)
offspring$compName <- as.factor(offspring$compName)
offspring$expType <- as.factor(offspring$expType)

offspring_tidy <- offspring
offspring_tidy$compConcentration <- as.numeric(offspring_tidy$compConcentration)

# One of the values in compConcentration was accidentally stored as text in Excel (with a decimal comma) and has now turned into NA, so we fix this value manually.

character_placement <- which(is.na(offspring_tidy$compConcentration))
character_value <- offspring$compConcentration[character_placement] %>% str_replace(",", ".") %>% parse_number()
offspring_tidy$compConcentration[character_placement] <- character_value
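
The same repair can be sketched on a small made-up vector using only base R. The vector `conc_chr` and its values are assumptions for illustration, not the real data. The idea is to convert everything that parses cleanly and then re-parse only the entries that became NA because of a decimal comma.

```r
# Hypothetical character column in which one value uses a decimal comma
conc_chr <- c("4.99", "0,156", "1.25")

# Plain conversion turns the decimal-comma entry into NA (normally with a warning)
conc_num <- suppressWarnings(as.numeric(conc_chr))

# Locate the entries that failed to parse and retry them after swapping "," for "."
failed <- which(is.na(conc_num) & !is.na(conc_chr))
conc_num[failed] <- as.numeric(sub(",", ".", conc_chr[failed], fixed = TRUE))

conc_num
```

Checking `failed` before overwriting shows exactly which rows were touched, which is useful to record when the fix is done by hand.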

# Let's check one last time whether the data types are correct.
offspring %>% select(c("RawData", "compName", "compConcentration"))
## # A tibble: 360 x 3
##    RawData compName                   compConcentration
##      <int> <fct>                      <chr>            
##  1      44 2,6-diisopropylnaphthalene 4.99             
##  2      37 2,6-diisopropylnaphthalene 4.99             
##  3      45 2,6-diisopropylnaphthalene 4.99             
##  4      47 2,6-diisopropylnaphthalene 4.99             
##  5      41 2,6-diisopropylnaphthalene 4.99             
##  6      35 2,6-diisopropylnaphthalene 4.99             
##  7      41 2,6-diisopropylnaphthalene 4.99             
##  8      36 2,6-diisopropylnaphthalene 4.99             
##  9      40 2,6-diisopropylnaphthalene 4.99             
## 10      38 2,6-diisopropylnaphthalene 4.99             
## # ... with 350 more rows
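# RawData and compName now have the right types. compConcentration still prints as <chr> here because this check
# uses `offspring`; the numeric version lives in `offspring_tidy`, which we use for the further analysis.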
offspring_tidy %>%
  ggplot(aes(x = log10(compConcentration + 0.0001), y = RawData)) +
  geom_jitter(aes(shape = expType, colour = compName), width = .1) +
  labs(title = "Amount of offspring from C. elegans incubated in different substances",
       subtitle = "Experiment data from (van der Voet et al. 2021)",
       x = "Log 10 of compound concentration",
       y = "Amount of offspring per C. elegans",
       colour = "Compound name",
       shape = "Experiment type") +
  scale_shape_discrete(labels = c("Negative control", "Positive control", "Vehicle A control", "Experiment")) +
  scale_colour_brewer(palette = "Dark2") +
  theme_classic()

The positive control in this experiment is ethanol, and the negative control is S-medium without an added substance.


To analyze this experiment, I would follow these steps:
1. Make a new column that shows which condition each worm belongs to (for example, group 1 would consist of 2,6-diisopropylnaphthalene at a concentration of 4.99 nM, and so on).
2. Check normality for every condition.
3. Choose the test based on the outcome of step 2:
   – Normally distributed data: perform an ANOVA with post-hoc tests and check whether the conditions differ from the control.
   – Not normally distributed data: perform a Kruskal-Wallis test.
4. To visualize the differences, make a smoothed line graph of the mean of every concentration per substance.
5. Compare these graphs with each other.
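
Steps 1 to 3 above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the analysis from the paper: the data frame `toy`, the compound name `compoundA`, the group means, and the 0.05 cut-off are all assumptions, and `TukeyHSD()` stands in for whichever post-hoc test is eventually chosen.

```r
set.seed(1)

# Made-up data: three conditions with 10 worms each (not the real data set)
toy <- data.frame(
  compName          = rep(c("S-medium", "compoundA", "compoundA"), each = 10),
  compConcentration = rep(c(0, 0.5, 5), each = 10),
  RawData           = round(c(rnorm(10, 40, 4), rnorm(10, 38, 4), rnorm(10, 25, 4)))
)

# Step 1: one label per compound/concentration combination
toy$condition <- factor(paste(toy$compName, toy$compConcentration, sep = "_"))

# Step 2: Shapiro-Wilk normality test for every condition
shapiro_p  <- tapply(toy$RawData, toy$condition,
                     function(x) shapiro.test(x)$p.value)
all_normal <- all(shapiro_p > 0.05)

# Step 3: pick the test based on the normality check
if (all_normal) {
  fit    <- aov(RawData ~ condition, data = toy)
  result <- list(test     = "ANOVA",
                 p_value  = summary(fit)[[1]][["Pr(>F)"]][1],
                 post_hoc = TukeyHSD(fit))
} else {
  kw     <- kruskal.test(RawData ~ condition, data = toy)
  result <- list(test = "Kruskal-Wallis", p_value = kw$p.value)
}

result$test
result$p_value
```

Storing the chosen test name next to the p-value makes it clear afterwards which branch was taken for a given data set.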

# Mean offspring count of the negative control (S-medium)
normalized_value <- offspring_tidy %>%
  filter(compName == "S-medium") %>%
  summarise(mean = mean(RawData, na.rm = TRUE))

# Express every count relative to the negative-control mean
offspring_tidy <- offspring_tidy %>%
  mutate(normalized_offspring = RawData / normalized_value$mean)


offspring_tidy %>%
  ggplot(aes(x = log10(compConcentration + 0.0001), y = normalized_offspring)) +
  geom_jitter(aes(shape = expType, colour = compName), width = .1) +
  labs(title = "Amount of offspring from C. elegans incubated in different substances",
       subtitle = "Experiment data from (van der Voet et al. 2021)",
       x = "Log 10 of compound concentration",
       y = "Normalized offspring amount by mean of negative control",
       colour = "Compound name",
       shape = "Experiment type") +
  scale_shape_discrete(labels = c("Negative control", "Positive control", "Vehicle A control", "Experiment")) +
  scale_colour_brewer(palette = "Dark2") +
  theme_classic()

We normalize the data by the mean of the negative control so that the differences between the substances are easier to see.
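
To make the interpretation of the normalized values concrete, here is a small worked example on made-up counts. The data frame `toy` and the compound name `compoundA` are assumptions for illustration only.

```r
# Made-up offspring counts (not the real data set)
toy <- data.frame(
  compName = c("S-medium", "S-medium", "compoundA", "compoundA"),
  RawData  = c(40, 44, 21, 42)
)

# Mean of the negative control (S-medium): (40 + 44) / 2 = 42
control_mean <- mean(toy$RawData[toy$compName == "S-medium"])

# A value of 1 means "as many offspring as the average negative control",
# a value of 0.5 means half as many
toy$normalized_offspring <- toy$RawData / control_mean
toy
```

Because every count is divided by the same constant, the shape of the plot stays the same; only the y-axis changes to a scale relative to the control.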

3.2 Checking the reproducibility of published papers

In this assignment, the study by (Strobl et al. 2020) is graded on the criteria for reproducibility,
and the study by (Brewer, Robey, and Unsworth 2021) is graded on code readability and reproducibility.

3.2.1 Pesticide influence on consumption rate and survival of bees

  • introduction of the paper

The use of pesticides is one of the main causes of biodiversity loss, and the combination of multiple pesticides could make this even worse. This study investigates the sublethal (food consumption) and lethal (survival) effects of pesticides on adult female solitary bees, Osmia bicornis.

To perform these tests, female solitary bees were divided into four groups:

– pesticide free (control)
– herbicide
– pesticide
– combined (both herbicide and pesticide)
Their consumption rate and longevity were measured, and the data from these two variables were used for the analysis.

There is no significant difference in survival or consumption between the different groups. There is, however, a significant positive correlation between the consumption rate and the longevity of these bees.

  • transparency criteria grading

transparency criteria   grading
study purpose           TRUE
data availability       FALSE (only part of the data is available)
data location           at the beginning / at the end
study location          TRUE (materials/methods)
author review           location and email are present at the top
ethics statement        FALSE
funding statement       TRUE
code availability       TRUE

The part of the data that is available can be found in this repository at “data/insects-957898-supplementary.xlsx”.

3.2.2 Impact of analysis decisions for episodic memory and retrieval practice

We focus solely on the code of this paper to see:
– If the code can be understood easily.
– If I can reproduce one of the figures.
– If there are any bugs/flaws in the code.

The code is available on this website.

The code has been copied to a new Rmd file in this repository under the name “_analysis_decisions_code.Rmd”.
The data has been downloaded and is available in this repository under the name “data/AllDataRR.csv”.

  • changes made:

– Changed the directory in line 11 so that it reads the data used in this study.
– Installed the packages in line 19 and line 180.

  • first impression:

– (+) Every test is in its own chunk, which makes the code easier to read.
– (+) Clear comments on what is happening.
– (+) Easy-to-understand code.
– (-) The chunks don't have names.
– (-) The individual results are far away from each other.
– (-) The same set of tests is performed multiple times; wrapping it in a function would make mistakes less likely.
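
As an illustration of the last point, a repeated test can be wrapped in a small helper and applied to several variable pairs. This is only a sketch under assumptions: the function `run_cor_test()`, the data frame `toy`, and the column names `score_a`, `score_b`, and `score_c` are hypothetical and do not come from the paper's script.

```r
# Hypothetical helper: run one correlation test and return a one-row summary
run_cor_test <- function(data, x, y) {
  res <- cor.test(data[[x]], data[[y]])
  data.frame(x = x, y = y,
             estimate = unname(res$estimate),
             p_value  = res$p.value)
}

set.seed(42)
# Made-up data with placeholder column names (not the variables from the paper)
toy <- data.frame(score_a = rnorm(30))
toy$score_b <- toy$score_a + rnorm(30, sd = 0.5)
toy$score_c <- rnorm(30)

# Apply the same test to several variable pairs instead of copy-pasting it
pairs   <- list(c("score_a", "score_b"), c("score_a", "score_c"))
results <- do.call(rbind, lapply(pairs, function(p) run_cor_test(toy, p[1], p[2])))
results
```

With this layout, a change to the test (for example, a different method) only has to be made in one place, and all results end up together in a single table.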

  • what this code is trying to achieve

The first part of the code looks at the correlation within the individual studies and across the different studies (lines 24-174).
The second part of the code looks at the correlation between the retrieval practice effect and episodic memory (EM) ability with the help of a graph. There are two graphs: one in which everything is mean centered and one in which it is not.
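
For readers unfamiliar with mean centering, here is a generic illustration on made-up numbers. The vectors `em_ability` and `retrieval_effect` are assumptions, and how the original script centers its variables (for example, per study or overall) is not shown here.

```r
# Made-up scores (not data from the paper)
em_ability       <- c(2, 4, 6, 8)
retrieval_effect <- c(1, 3, 2, 5)

# Mean centering subtracts the mean, so the centered variable averages zero
em_centered <- em_ability - mean(em_ability)
em_centered
# -3 -1  1  3

# Centering a variable over the whole sample shifts the axis
# but leaves the correlation unchanged
cor(em_ability, retrieval_effect)
cor(em_centered, retrieval_effect)
```

Centering within groups (such as within each study) is a different operation and can change the correlation, which is one reason the two graphs may look different.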

  • final judgement (graded from 1 to 5, where 1 is very hard/bad and 5 is very easy/good):

– readability = 4
– reproducibility = 5
– efficiency = 2

3.3 Organisation of my files

To show my ability to organize files, here is an example of the file structure from one of my previous projects.

[Figure: File structure from previous project]

References

Brewer, Gene A., Alison Robey, and Nash Unsworth. 2021. “Discrepant Findings on the Relation Between Episodic Memory and Retrieval Practice: The Impact of Analysis Decisions.” Journal of Memory and Language 116 (February): 104185. https://doi.org/10.1016/j.jml.2020.104185.
Strobl, Verena, Domenic Camenzind, Angela Minnameyer, Stephanie Walker, Michael Eyer, Peter Neumann, and Lars Straub. 2020. “Positive Correlation Between Pesticide Consumption and Longevity in Solitary Bees: Are We Overlooking Fitness Trade-Offs?” Insects 11 (11): 819. https://doi.org/10.3390/insects11110819.
van der Voet, Monique, Marc Teunis, Johanna Louter-van de Haar, Nienke Stigter, Diksha Bhalla, Martijn Rooseboom, Kimberley E Wever, et al. 2021. “Towards a Reporting Guideline for Developmental and Reproductive Toxicology Testing in C. Elegans and Other Nematodes.” Toxicology Research 10 (6): 1202–10. https://doi.org/10.1093/toxres/tfab109.